
Binary Neural Architecture Search

[Figure 4.4 diagram: a loop of "Sample without Replacement & Train" over the child/parent models, "Compute" the evaluation indicator from child/parent accuracies, and "Reduce search space", repeated K times; the operation with the minimum indicator is removed from each edge.]
FIGURE 4.4
The main framework of the proposed Child-Parent search strategy. In a loop, we first sample an operation without replacement for each edge of the search space, and then train the child and parent models generated from the same architecture simultaneously. Second, we use Eqs. 4.15 and 4.28 to compute the evaluation indicator from the accuracy of both models on the validation set. Once all operations have been selected, we remove the operation with the worst performance on each edge.

loss between child and parent networks. It is observed that the worst operations in the early stage usually perform poorly in the end as well. On the basis of this observation, we remove the operation with the worst performance according to the performance indicator. This process is repeated until only one operation is left on each edge. We reformulate the traditional loss function as a kernel-level Child-Parent loss for the binarized optimization of the child-parent model.

4.3.1 Child-Parent Model for Network Binarization

Network binarization computes neural networks with 1-bit weights and activations to approximate the full-precision network, and can significantly compress deep convolutional neural networks (CNNs). Previous work [287] usually investigates the binarization problem by using the full-precision model to guide the optimization of the binarized model. Based on this investigation, we reformulate NAS-based network binarization as a Child-Parent model, as shown in Fig. 4.5. The child and parent models are the binarized model and its full-precision counterpart, respectively.
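To make the child model concrete, the sketch below shows a common 1-bit weight binarization scheme (sign codes plus a mean-absolute-value scale, in the style of XNOR-Net). It illustrates network binarization in general and is not the exact formulation used in this chapter.

```python
import numpy as np

def binarize_weights(w):
    """Binarize a full-precision weight tensor to 1-bit values.

    The sign gives 1-bit codes in {-1, +1}; scaling by the mean
    absolute value keeps the binarized tensor close to the
    full-precision one, which is what the parent helps the child fit.
    """
    alpha = np.mean(np.abs(w))            # per-tensor scaling factor
    codes = np.where(w >= 0, 1.0, -1.0)   # 1-bit codes
    return alpha * codes

w = np.array([0.3, -0.7, 0.1, -0.2])
w_bin = binarize_weights(w)   # every entry is +/- mean(|w|) = +/- 0.325
```

Only the codes and one scalar per tensor need to be stored, which is the source of the compression.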

Conventional NAS is inefficient because of the complicated reward computation involved in network training, where a structure is usually evaluated only after network training converges. There are also methods that evaluate a cell during network training. [292] points out that the best choice in the early stages is not necessarily the final optimal one, whereas the worst operation in the early stages usually performs poorly in the end, and this phenomenon becomes more and more significant as training progresses. On the basis of this observation, we propose a simple yet effective operation-removing process, which is the key task of the proposed CP-model.
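The operation-removing process can be sketched as follows. Here `train_and_eval` is a hypothetical stand-in for the sampling-based training step (it returns the validation accuracies of the child and parent built from one sampled architecture), and `lam` is a hypothetical weight on the child-parent gap; the actual indicator is given by Eqs. 4.15 and 4.28.

```python
def child_parent_search(edges, candidate_ops, train_and_eval, lam=1.0):
    """Repeatedly remove the worst operation on each edge until a
    single operation remains per edge."""
    space = {e: list(candidate_ops) for e in edges}
    while any(len(ops) > 1 for ops in space.values()):
        n = max(len(ops) for ops in space.values())
        scores = {e: {op: None for op in ops} for e, ops in space.items()}
        for i in range(n):
            # Sampling without replacement: each edge cycles through its
            # remaining operations, so every operation is evaluated.
            arch = {e: ops[i % len(ops)] for e, ops in space.items()}
            acc_c, acc_p = train_and_eval(arch)
            for e, op in arch.items():
                # Indicator sketch: reward a strong child, penalize a
                # large child-parent accuracy gap.
                s = acc_c - lam * (acc_p - acc_c)
                prev = scores[e][op]
                scores[e][op] = s if prev is None else max(prev, s)
        # Remove the worst-scoring operation on every unfinished edge.
        for e, ops in space.items():
            if len(ops) > 1:
                ops.remove(min(ops, key=lambda op: scores[e][op]))
    return {e: ops[0] for e, ops in space.items()}
```

Each round removes one operation per edge, so for k candidate operations the loop finishes after k - 1 rounds.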

Intuitively, the gap between the abilities of the child and the parent, and the degree to which the child can handle its problems independently, are the two main aspects that should be considered when defining a reasonable performance evaluation measure. Our Child-Parent model introduces a corresponding performance indicator to improve search efficiency. The indicator includes two parts: the performance loss between the binarized network (child) and the full-precision network (parent), and the performance of the binarized network (child) itself.
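Read literally, the two parts combine as in the following sketch. This is an illustrative form with a hypothetical weight `beta`, not the chapter's Eqs. 4.15 and 4.28.

```python
def performance_indicator(acc_child, acc_parent, beta=1.0):
    """Two-part indicator (illustrative form):
    the child's own accuracy minus a weighted child-parent
    performance loss. A smaller gap and a higher child accuracy
    both raise the score; `beta` is a hypothetical weight.
    """
    gap = acc_parent - acc_child   # performance loss of child vs. parent
    return acc_child - beta * gap

# At equal child accuracy, a child closer to its parent scores higher:
close_child = performance_indicator(0.60, 0.70)   # gap 0.10
far_child = performance_indicator(0.60, 0.90)     # gap 0.30
```

Operations are then ranked by this score, and the lowest-scoring operation on each edge is removed.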